{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structure Learning in Bayesian Networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows examples of using the structure learning algorithms in pgmpy. Currently, pgmpy implements three main algorithms:\n", "1. PC, with stable and parallel variants\n", "2. Hill-Climb Search\n", "3. Exhaustive Search\n", "\n", "For PC, the following conditional independence tests can be used:\n", "1. Chi-Square test (https://en.wikipedia.org/wiki/Chi-squared_test)\n", "2. Pearsonr (https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression)\n", "3. G-squared (https://en.wikipedia.org/wiki/G-test)\n", "4. Log-likelihood (https://en.wikipedia.org/wiki/G-test)\n", "5. Freeman-Tukey (Read, Campbell B. \"Freeman–Tukey chi-squared goodness-of-fit statistics.\" Statistics & probability letters 18.4 (1993): 271-278.)\n", "6. Modified Log-likelihood\n", "7. Neyman (https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)\n", "8. Cressie Read (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)\n", "9. Power Divergence (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)\n", "\n", "For Hill-Climb Search and Exhaustive Search, the following scoring methods can be used:\n", "1. K2 Score\n", "2. BDeu Score\n", "3. 
Bic Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate some data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from itertools import combinations\n", "\n", "import networkx as nx\n", "from sklearn.metrics import f1_score\n", "\n", "from pgmpy.estimators import PC, HillClimbSearch, ExhaustiveSearch\n", "from pgmpy.estimators import K2Score\n", "from pgmpy.utils import get_example_model\n", "from pgmpy.sampling import BayesianModelSampling" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Generating for node: CVP: 100%|██████████| 37/37 [00:00<00:00, 544.13it/s]\n" ] }, { "data": { "text/html": [ "
\n",
"|   | HISTORY | CVP | PCWP | HYPOVOLEMIA | LVEDVOLUME | LVFAILURE | STROKEVOLUME | ERRLOWOUTPUT | HRBP | HREKG | ... | MINVOLSET | VENTMACH | VENTTUBE | VENTLUNG | VENTALV | ARTCO2 | CATECHOL | HR | CO | BP |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | TRUE | LOW | LOW | FALSE | LOW | TRUE | LOW | FALSE | HIGH | NORMAL | ... | NORMAL | NORMAL | LOW | ZERO | ZERO | HIGH | HIGH | HIGH | LOW | LOW |\n",
"| 1 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | NORMAL | FALSE | HIGH | HIGH | ... | NORMAL | NORMAL | LOW | ZERO | LOW | HIGH | HIGH | HIGH | HIGH | HIGH |\n",
"| 2 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | NORMAL | FALSE | HIGH | HIGH | ... | LOW | LOW | ZERO | ZERO | ZERO | HIGH | HIGH | HIGH | HIGH | HIGH |\n",
"| 3 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | HIGH | FALSE | HIGH | HIGH | ... | HIGH | HIGH | HIGH | LOW | HIGH | LOW | HIGH | HIGH | HIGH | HIGH |\n",
"| 4 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | NORMAL | FALSE | HIGH | HIGH | ... | NORMAL | NORMAL | ZERO | HIGH | LOW | HIGH | HIGH | HIGH | HIGH | LOW |\n",
"\n",
"5 rows × 37 columns
\n", "